Art Classification Using Neural Networks¶

Submission for MIE1517 - Introduction to Deep Learning

Team 7

Taeyeon Kim, Sameeksha Naik, Kexin Qin

Dec 8. 2023

The link to this Google Colab file is below:

https://drive.google.com/file/d/1up_qnnzx-mobJznSdCRdtR-2jjF7abns/view?usp=sharing

1 - Introduction¶

This project explores how convolutional neural networks may be useful in classifying art styles across different time periods and cultures. This would be a useful tool for art historians as they valuate and authenticate pieces of art.

The WikiArt dataset on Kaggle was used as a starting point. This dataset features 80k images in 27 different styles. To scale down computation, five art styles were selected, taking into consideration their time period and similarity to other styles. 300-400 images of each style were selected and a custom data was created and is hosted on Google Drive with this link. The dataset has 1669 images.

art_cnn.svg

2 - Data Processing and Loading¶

2.1 - File Structure¶

The custom dataset is organized as follows:

Project_Data/
├─ Baroque/
│  ├─ annibale-carracci_two-children-teasing-a-cat-1590.jpg
│  ├─ annibale-carracci_venus-adonis-and-cupid-1590.jpg
│  ├─ ...
├─ Cubism/
├─ Minimalism/
├─ Popart/
├─ Ukiyo/

Using the split-folders package with the ImageFolder class that is native to PyTorch, the training, validation, and testing datasets can be split 60%, 20%, and 20%. The resulting file structure will look as follows:

my_dataset/
├─ train/
│  ├─ Baroque
│  ├─ Cubism
│  ├─ Minimalism
|  ├─ Popart
|  ├─ Ukiyo
├─ val/
│  ├─ Baroque
│  ├─ Cubism
│  ├─ ...
├─ test/

Since the model will only work when the images are the same size, the input are all transformed to be 224 x 224 pixels. The image are simply resized, not cropped or padded. This means the aspect ratio is altered.

In [ ]:
# import necessary libraries
import torch
import torchvision
import torchvision.transforms as transforms
import torchvision.models
from google.colab import drive
import matplotlib.pyplot as plt
import torch.optim as optim
import torch.nn as nn
import torch.nn.functional as F
import time
from torchvision import datasets, transforms
import seaborn as sns
from sklearn.metrics import f1_score, precision_recall_fscore_support, confusion_matrix
from matplotlib.table import Table
import pandas as pd
In [ ]:
pip install split-folders
Collecting split-folders
  Downloading split_folders-0.5.1-py3-none-any.whl (8.4 kB)
Installing collected packages: split-folders
Successfully installed split-folders-0.5.1
In [ ]:
from google.colab import drive
drive.mount('/content/drive')

import splitfolders

# Define the train/validation/test ratio
splitfolders.ratio("/content/drive/MyDrive/Colab Notebooks/Project_Data",
                   output="my_dataset", seed=1, ratio=(.6, .2, .2), group_prefix=None)

data_transform = transforms.Compose( [transforms.Resize((224,224)),
                                      transforms.ToTensor()])

# Create the three data sets
train_data = datasets.ImageFolder('my_dataset/train', transform=data_transform)
validation_data = datasets.ImageFolder('my_dataset/val', transform=data_transform)
test_data = datasets.ImageFolder('my_dataset/test', transform=data_transform)
Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
Copying files: 1669 files [00:09, 183.62 files/s]

3 - Helper Functions¶

We have defined the standard train and get_accuracy functions, which will help us compute the forward pass, backward pass, and the accuracies for each epoch. After each epoch, the accuracies and loss are recorded so that they can be plotted after the training is complete.

In [ ]:
def train(model, train_data, val_data, batch_size=64, learning_rate =0.01, num_epochs=1):
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(model.parameters(), lr=learning_rate, momentum=0.9)

    train_loader=torch.utils.data.DataLoader(train_data, batch_size=30, shuffle=True)
    val_loader=torch.utils.data.DataLoader(val_data, batch_size=30, shuffle=True)

    iters, losses, train_acc, val_acc = [], [], [], []

    # training
    for epoch in range(num_epochs):
        for imgs, labels in iter(train_loader):

            #############################################
            #To Enable GPU Usage
            if use_cuda and torch.cuda.is_available():
              imgs = imgs.cuda()
              labels = labels.cuda()
            #############################################

            out = model(imgs)             # forward pass
            loss = criterion(out, labels) # compute the total loss
            loss.backward()               # backward pass (compute parameter updates)
            optimizer.step()              # make the updates for each parameter
            optimizer.zero_grad()         # a clean up step for PyTorch

            # save the current training information
        iters.append(epoch)
        losses.append(float(loss)/batch_size)             # compute *average* loss
        train_accuracy=get_accuracy(model,train_loader)
        val_accuracy=get_accuracy(model,val_loader)
        print('Epoch:',epoch,'Train Accuracy:',train_accuracy,'Validation Accuracy:',val_accuracy)
        train_acc.append(get_accuracy(model, train_loader)) # compute training accuracy
        val_acc.append(get_accuracy(model, val_loader))  # compute validation accuracy


    # plotting
    plt.title("Training Curve")
    plt.plot(iters, losses, label="Train")
    plt.xlabel("Iterations")
    plt.ylabel("Loss")
    plt.show()

    plt.title("Training Curve")
    plt.plot(iters, train_acc, label="Train")
    plt.plot(iters, val_acc, label="Validation")
    plt.xlabel("Iterations")
    plt.ylabel("Training Accuracy")
    plt.legend(loc='best')
    plt.show()

    print("Final Training Accuracy: {}".format(train_acc[-1]))
    print("Final Validation Accuracy: {}".format(val_acc[-1]))
In [ ]:
def get_accuracy(model, data_loader):
    correct = 0
    total = 0
    for imgs, labels in data_loader:

        #############################################
        #To Enable GPU Usage
        if use_cuda and torch.cuda.is_available():
          imgs = imgs.cuda()
          labels = labels.cuda()
        #############################################

        output = model(imgs)

        #select index with maximum prediction score
        pred = output.max(1, keepdim=True)[1]
        correct += pred.eq(labels.view_as(pred)).sum().item()
        total += imgs.shape[0]
    return correct / total

4 - Inital CNN Network¶

4.1 - Model Architecture¶

The first strategy was to develop a custom CNN network for feature extraction and classification. The following model has two convolutional layers and two fully connected layers. Between convolutional layers, max-pooling was applied. The figure below summarizes the architecture.orig_cnn.svg

In [ ]:
class CNN(nn.Module):
  def __init__(self):
    super(CNN,self).__init__()
    self.conv1=nn.Conv2d(3,5,5) # in channels, out channels, kernel size
    self.pool=nn.MaxPool2d(2,2) # kernel size, stride
    self.conv2=nn.Conv2d(5,10,5) # in channels, out channels, kernel size
    self.fc1=nn.Linear(10*53*53,32) # 1st fully connected layer
    self.fc2=nn.Linear(32,5) # 2nd fully connected layer

  def forward(self, x): #x is the input
    x=self.pool(F.relu(self.conv1(x))) # 1st convolution layer + ReLU + maxpooling
    x=self.pool(F.relu(self.conv2(x))) # 2nd convolution layer + ReLU + maxpooling
    x=x.view(-1,10*53*53) # flattens the feature maps
    x=torch.relu(self.fc1(x)) # 1st fully connected layer + ReLU
    x=self.fc2(x) # 2nd fully connected layer
    return x

4.2 - Custom CNN Results¶

We'll now train the first CNN model with the data. The hyperparameters are as follows:

batch size: 64
learning rate: 0.005
number of epochs: 8
In [ ]:
use_cuda=True
model=CNN()

if use_cuda and torch.cuda.is_available():
  model=model.cuda()
  print('CUDA is available. Training on GPU')

else:
  print('CUDA is not available. Training on CPU')

start_time = time.time()

train(model, train_data, validation_data, batch_size=64, learning_rate=0.005, num_epochs=8)

end_time_CNN_1 = time.time() - start_time

# print("The CNN takes ", end_time_CNN_1," seconds to train")
CUDA is available. Training on GPU
Epoch: 0 Train Accuracy: 0.244 Validation Accuracy: 0.22289156626506024
Epoch: 1 Train Accuracy: 0.368 Validation Accuracy: 0.3493975903614458
Epoch: 2 Train Accuracy: 0.455 Validation Accuracy: 0.42771084337349397
Epoch: 3 Train Accuracy: 0.491 Validation Accuracy: 0.45180722891566266
Epoch: 4 Train Accuracy: 0.474 Validation Accuracy: 0.4427710843373494
Epoch: 5 Train Accuracy: 0.394 Validation Accuracy: 0.3825301204819277
Epoch: 6 Train Accuracy: 0.556 Validation Accuracy: 0.49698795180722893
Epoch: 7 Train Accuracy: 0.479 Validation Accuracy: 0.39457831325301207
Final Training Accuracy: 0.494
Final Validation Accuracy: 0.4819277108433735

The final training accuracy is about 49%, and the validation accuracy is 48%. While this indicates that the model is trainable to this particular dataset, further improvements can be made.

This prompted the use of Transfer Learning with AlexNet. The pretrained weights may noticeably improve the results. Should this be the case, expanding the project to add more classes is a possibility.

5 - AlexNet Training and Results¶

First, our image data is transformed by resizing to 224x224 pixels and converted into tensors. We iterate through the training, validation and test datasets to extract features using the pretrained AlexNet model's convolutional layers. These extracted features (tensors) are saved locally. This avoids re-computing features each time the model is ran. which helps to free up GPU memory and save on training and evaluating time.

In [ ]:
# TRANSFER LEARNING with ALEXNET

import torchvision.models
alexnet = torchvision.models.alexnet(pretrained=True) # weights
In [ ]:
import numpy as np

np.random.seed(1000)
torch.manual_seed(1000)
Out[ ]:
<torch._C.Generator at 0x79ea5f705530>
In [ ]:
import os
import splitfolders
from google.colab import drive
drive.mount('/content/drive')

splitfolders.ratio("/content/drive/MyDrive/Colab Notebooks/Project_Data", output="my_dataset", seed=1, ratio=(.6, .2, .2), group_prefix=None)

from torchvision import datasets, transforms

data_transform = transforms.Compose( [transforms.Resize((224,224)),
                                      transforms.ToTensor()])

train_data = datasets.ImageFolder('my_dataset/train', transform=data_transform)
validation_data = datasets.ImageFolder('my_dataset/val', transform=data_transform)
test_data = datasets.ImageFolder('my_dataset/test', transform=data_transform)

batch_size = 1
num_workers = 0
train_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size,
                                           num_workers=num_workers, shuffle=True)
val_loader = torch.utils.data.DataLoader(validation_data, batch_size=batch_size,
                                           num_workers=num_workers, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_data, batch_size=batch_size,
                                           num_workers=num_workers, shuffle=True)

classes=['Baroque','Cubism','Minimalism','Popart','Ukiyo']

#training
i = 0
for img, label in train_loader:
  features = alexnet.features(img)
  features_tensor = torch.from_numpy(features.detach().numpy())

  save_features_dir = '/content/my_dataset/train/'+ str(classes[label]) + '/'
  if not os.path.isdir(save_features_dir):
    os.mkdir(save_features_dir)
  torch.save(features_tensor.squeeze(0), save_features_dir + str(i) + '.tensor')
  i += 1

#validation
i = 0
for img, label in val_loader:
  features = alexnet.features(img)
  features_tensor = torch.from_numpy(features.detach().numpy())

  save_features_dir = '/content/my_dataset/val/'+ str(classes[label]) + '/'
  if not os.path.isdir(save_features_dir):
    os.mkdir(save_features_dir)
  torch.save(features_tensor.squeeze(0), save_features_dir + str(i) + '.tensor')
  i += 1

#testing
i = 0
for img, label in test_loader:
  features = alexnet.features(img)
  features_tensor = torch.from_numpy(features.detach().numpy())

  save_features_dir = '/content/my_dataset/test/'+ str(classes[label]) + '/'
  if not os.path.isdir(save_features_dir):
    os.mkdir(save_features_dir)
  torch.save(features_tensor.squeeze(0), save_features_dir + str(i) + '.tensor')
  i += 1
Mounted at /content/drive
Copying files: 1669 files [01:05, 25.50 files/s]

To verify the size of the input to the model, the size of the feature tensor was printed. The feature tensors have 256 number of channels and a size of 6x6. The CNN models will need to also have in_channels of 256 to match the feature tensor size. Due to the 6x6 size of the tensors, there are limitations in the complexity of the architecture. To elaborate:

  • If the spatial dimensions are small, the model may struggle to capture complex patterns or relationships within the input data as it has less spatial context to work with. Each convolutional filter operates on a smaller spatial area in subsequent layers which limits the model.
In [ ]:
print(features.size())
torch.Size([1, 256, 6, 6])

The class_to_idx method was called on the data loader to verify the indices assigned to each class. This will be used later when implmenting the model on new data.

In [ ]:
print(train_data.class_to_idx)
{'Baroque': 0, 'Cubism': 1, 'Minimalism': 2, 'Popart': 3, 'Ukiyo': 4}

5.1 - Model Architecture¶

We created a CNN model that takes AlexNet features and consists of two convolutional layers, one pooling layer, and two fully connected layers. The loss we are trying to minimize is multi-class cross entropy loss as shown by the equation. $$ L(y, p) = - \sum_{i=1}^{N} y_i \log(p_i) $$ The last fully connected layer maps the features into 5 outputs, corresponding to the number of art styles we have. This choice of model architecture allows us to classify images into multiple categories thus allowing us to classify different art styles. Lastly, we tuned our hyperparameters. We tried out different combinations of batch size, learning rate, and the number of epochs during training to balance between training time and accuracy, while preventing overfitting.

We will test three different CNN models and select the model with the highest accuracy. First, we are using alexnet_CNN which has the architecture shown in the image below. It has a total of 33,491 parameters.

first_alex_net.svg

In [ ]:
class alexnet_CNN(nn.Module):
    def __init__(self):
        super(alexnet_CNN, self).__init__()
        self.conv1 = nn.Conv2d(256, 80, 3) #in_channels, out_chanels, kernel_size
        self.conv2 = nn.Conv2d(80, 15, 3) #in_channels, out_chanels, kernel_size
        self.fc1 = nn.Linear((15 * 2 * 2), 32)
        self.fc2 = nn.Linear(32, 5) # 5 is number of classes

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        x = x.view(-1, 15 * 2 * 2)#
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x
In [ ]:
train_data_features = torchvision.datasets.DatasetFolder('/content/my_dataset/train/', loader=torch.load, extensions=('.tensor'))
val_data_features = torchvision.datasets.DatasetFolder('/content/my_dataset/val/', loader=torch.load, extensions=('.tensor'))
test_data_features = torchvision.datasets.DatasetFolder('/content/my_dataset/test/', loader=torch.load, extensions=('.tensor'))

use_cuda=True
model_an = alexnet_CNN()

if use_cuda and torch.cuda.is_available():
  model_an = model_an.cuda()
  print('CUDA is available. Training on GPU')

else:
  print('CUDA is not available. Training on CPU')
start_time=time.time()
train(model_an, train_data_features,val_data_features, batch_size=16, learning_rate=0.001,num_epochs=15)
end_time=time.time()
total_time_CNN_2=end_time-start_time
print('The CNN takes',total_time_CNN_2,'seconds to train')
CUDA is available. Training on GPU
Epoch: 0 Train Accuracy: 0.329 Validation Accuracy: 0.3253012048192771
Epoch: 1 Train Accuracy: 0.355 Validation Accuracy: 0.3373493975903614
Epoch: 2 Train Accuracy: 0.431 Validation Accuracy: 0.4246987951807229
Epoch: 3 Train Accuracy: 0.554 Validation Accuracy: 0.5
Epoch: 4 Train Accuracy: 0.714 Validation Accuracy: 0.6746987951807228
Epoch: 5 Train Accuracy: 0.796 Validation Accuracy: 0.7379518072289156
Epoch: 6 Train Accuracy: 0.831 Validation Accuracy: 0.7439759036144579
Epoch: 7 Train Accuracy: 0.856 Validation Accuracy: 0.7891566265060241
Epoch: 8 Train Accuracy: 0.869 Validation Accuracy: 0.7981927710843374
Epoch: 9 Train Accuracy: 0.881 Validation Accuracy: 0.8042168674698795
Epoch: 10 Train Accuracy: 0.894 Validation Accuracy: 0.8192771084337349
Epoch: 11 Train Accuracy: 0.897 Validation Accuracy: 0.8072289156626506
Epoch: 12 Train Accuracy: 0.922 Validation Accuracy: 0.8253012048192772
Epoch: 13 Train Accuracy: 0.893 Validation Accuracy: 0.8283132530120482
Epoch: 14 Train Accuracy: 0.939 Validation Accuracy: 0.8343373493975904
Final Training Accuracy: 0.939
Final Validation Accuracy: 0.8343373493975904
The CNN takes 30.856274127960205 seconds to train

alexnet_CNN overfits to the training data as shown in the training curve above. The final training accuracy is 0.956 while the final validation accuracy is 0.846 and plateaus after around 10 epochs. To address this overfitting, we will try OtherAlexCNN which has more parameters of 225,657 vs. 33,491. The architecture is illustrated below.

other_Alexnet.svg

In [ ]:
class OtherAlexCNN(nn.Module):
  def __init__(self, kernel_size=3):
      super(OtherAlexCNN, self).__init__()
      self.conv1 = nn.Conv2d(256, 100, kernel_size) #in_channels, out_chanels, kernel_size
      self.pool = nn.MaxPool2d(2, 2) #kernel_size, stride
      self.conv2 = nn.Conv2d(100, 50, kernel_size) #in_channels, out_chanels, kernel_size
      self.fc1 = nn.Linear(50*2*2, 32)
      self.fc2 = nn.Linear(32, 5)

  def forward(self, x):
      x = F.relu(self.conv1(x))
      x = F.relu(self.conv2(x))
      x = x.view(-1, 200)
      x = F.relu(self.fc1(x))
      x = self.fc2(x)
      x = x.squeeze(1)
      return x
In [ ]:
train_data_features = torchvision.datasets.DatasetFolder('/content/my_dataset/train/', loader=torch.load, extensions=('.tensor'))
val_data_features = torchvision.datasets.DatasetFolder('/content/my_dataset/val/', loader=torch.load, extensions=('.tensor'))
test_data_features = torchvision.datasets.DatasetFolder('/content/my_dataset/test/', loader=torch.load, extensions=('.tensor'))

use_cuda=True
model_can=OtherAlexCNN()

if use_cuda and torch.cuda.is_available():
  model_can=model_can.cuda()
  print('CUDA is available. Training on GPU')

else:
  print('CUDA is not available. Training on CPU')
start_time=time.time()
train(model_can, train_data_features,val_data_features, batch_size=16, learning_rate=0.001,num_epochs=15)
end_time=time.time()
total_time_CNN_3=end_time-start_time
print('The CNN takes',total_time_CNN_3,'seconds to train')
CUDA is available. Training on GPU
Epoch: 0 Train Accuracy: 0.284 Validation Accuracy: 0.2680722891566265
Epoch: 1 Train Accuracy: 0.326 Validation Accuracy: 0.26506024096385544
Epoch: 2 Train Accuracy: 0.531 Validation Accuracy: 0.5180722891566265
Epoch: 3 Train Accuracy: 0.675 Validation Accuracy: 0.6295180722891566
Epoch: 4 Train Accuracy: 0.754 Validation Accuracy: 0.7018072289156626
Epoch: 5 Train Accuracy: 0.8 Validation Accuracy: 0.75
Epoch: 6 Train Accuracy: 0.814 Validation Accuracy: 0.7620481927710844
Epoch: 7 Train Accuracy: 0.86 Validation Accuracy: 0.7921686746987951
Epoch: 8 Train Accuracy: 0.863 Validation Accuracy: 0.8012048192771084
Epoch: 9 Train Accuracy: 0.878 Validation Accuracy: 0.8192771084337349
Epoch: 10 Train Accuracy: 0.892 Validation Accuracy: 0.8313253012048193
Epoch: 11 Train Accuracy: 0.904 Validation Accuracy: 0.822289156626506
Epoch: 12 Train Accuracy: 0.902 Validation Accuracy: 0.822289156626506
Epoch: 13 Train Accuracy: 0.923 Validation Accuracy: 0.8162650602409639
Epoch: 14 Train Accuracy: 0.934 Validation Accuracy: 0.8162650602409639
Final Training Accuracy: 0.934
Final Validation Accuracy: 0.8162650602409639
The CNN takes 16.094091653823853 seconds to train

The model OtherAlexCNN is able to achieve a higher validation accuracy result of 0.837 but it is still overfitting to the training data as it plateaus around 10 epochs. To address this overfitting we will try using drop out layers.

After trying different drop out probabilities, we found that using a drop out probability of 0.2, batch_size of 64, learning_rate of 0.001 and num_epochs of 15 resulted in the best results. The figure below shows the dropout architecture.dropout_alex_net.svg

In [ ]:
class AlexCNN_dropout(nn.Module):
  def __init__(self, kernel_size=3,dropout_prob=0.2):
      super(AlexCNN_dropout, self).__init__()
      self.conv1 = nn.Conv2d(256, 100, kernel_size) #in_channels, out_chanels, kernel_size
      self.pool = nn.MaxPool2d(2, 2) #kernel_size, stride
      self.conv2 = nn.Conv2d(100, 100, kernel_size) #in_channels, out_chanels, kernel_size
      self.fc1 = nn.Linear(100*2*2, 32)
      self.dropout = nn.Dropout(p=dropout_prob)
      self.fc2 = nn.Linear(32, 5)

  def forward(self, x):
      x = F.relu(self.conv1(x))
      x = self.dropout(x)
      x = F.relu(self.conv2(x))
      x = self.dropout(x)
      x = x.view(-1, 100 * 2 * 2)
      x = F.relu(self.fc1(x))
      x = self.dropout(x)
      x = self.fc2(x)
      x = x.squeeze(1)
      return x
In [ ]:
train_data_features = torchvision.datasets.DatasetFolder('/content/my_dataset/train/', loader=torch.load, extensions=('.tensor'))
val_data_features = torchvision.datasets.DatasetFolder('/content/my_dataset/val/', loader=torch.load, extensions=('.tensor'))
test_data_features = torchvision.datasets.DatasetFolder('/content/my_dataset/test/', loader=torch.load, extensions=('.tensor'))

use_cuda=True
model_d=AlexCNN_dropout()

if use_cuda and torch.cuda.is_available():
  model_d=model_d.cuda()
  print('CUDA is available. Training on GPU')

else:
  print('CUDA is not available. Training on CPU')

start_time=time.time()
train(model_d, train_data_features,val_data_features, batch_size=64, learning_rate=0.001,num_epochs=15)
end_time=time.time()
total_time_d=end_time-start_time
CUDA is available. Training on GPU
Epoch: 0 Train Accuracy: 0.257 Validation Accuracy: 0.2469879518072289
Epoch: 1 Train Accuracy: 0.264 Validation Accuracy: 0.2680722891566265
Epoch: 2 Train Accuracy: 0.378 Validation Accuracy: 0.3674698795180723
Epoch: 3 Train Accuracy: 0.581 Validation Accuracy: 0.5572289156626506
Epoch: 4 Train Accuracy: 0.669 Validation Accuracy: 0.608433734939759
Epoch: 5 Train Accuracy: 0.708 Validation Accuracy: 0.6596385542168675
Epoch: 6 Train Accuracy: 0.747 Validation Accuracy: 0.6987951807228916
Epoch: 7 Train Accuracy: 0.786 Validation Accuracy: 0.7259036144578314
Epoch: 8 Train Accuracy: 0.799 Validation Accuracy: 0.7560240963855421
Epoch: 9 Train Accuracy: 0.838 Validation Accuracy: 0.7620481927710844
Epoch: 10 Train Accuracy: 0.806 Validation Accuracy: 0.7891566265060241
Epoch: 11 Train Accuracy: 0.859 Validation Accuracy: 0.8162650602409639
Epoch: 12 Train Accuracy: 0.878 Validation Accuracy: 0.8192771084337349
Epoch: 13 Train Accuracy: 0.891 Validation Accuracy: 0.8102409638554217
Epoch: 14 Train Accuracy: 0.894 Validation Accuracy: 0.8192771084337349
Final Training Accuracy: 0.906
Final Validation Accuracy: 0.8283132530120482

Using drop out layers definitely addresses the overfitting issue compared to the previous models that had larger gaps in the training and validation accuracies. The final training accuracy is 0.917 and the validation accuracy is 0.843. However it is a tradeoff as the final validation accuracy is lower.

5.2 - Testing on the Best Model¶

We picked the best model that gave the highest validation accuracy results from the various CNN models that we have tried. Below is a table showing a summary of the various CNN models. alexnet-CNN resulted in the best results and was used to test on the test dataset.

In [ ]:
import matplotlib.pyplot as plt
import pandas as pd

models_data = [
    {"Model": "alexnet_CNN", "Train Acc": 0.939, "Val Acc": 0.834, "Epochs": 15, "Learning Rate": 0.001, "Batch Size": 16, "Parameters": 33491},
    {"Model": "OtherAlexCNN", "Train Acc": 0.934, "Val Acc": 0.816, "Epochs": 15, "Learning Rate": 0.001, "Batch Size": 16, "Parameters": 225657},
    {"Model": "AlexCNN_dropout", "Train Acc": 0.906, "Val Acc": 0.828, "Epochs": 15, "Learning Rate": 0.001, "Batch Size": 64, "Parameters": 207857},
]

df = pd.DataFrame(models_data)

fig, ax = plt.subplots(figsize=(10, 5))
ax.axis('off')
table = ax.table(cellText=df.values, colLabels=df.columns, loc='center', cellLoc='center', colColours=["#ADD8E6"] * len(df.columns))

table.auto_set_font_size(False)
table.set_fontsize(10)
table.scale(1.5, 2.5)

plt.show()
In [ ]:
# test alex net on test dataset

test_loader=torch.utils.data.DataLoader(test_data_features, batch_size=64, num_workers=num_workers,shuffle=True)
get_accuracy(model_an, test_loader)
Out[ ]:
0.8367952522255193

The test accuracy obtained is 83.7 %. We will further analyze our test results below.

5.3 - Analysis of Results¶

5.3.1 - Running Time¶

Comparing the running times of all of the different models that we tried are summarized in a table below. CNN using generated features is much slower at 1000 seconds compared to using a CNN model with pretrained AlexNet features of around 16 seconds. This shows the computational benefits of using pretrained features.

In [ ]:
models = ['CNN', 'alexnet_CNN', 'OtherAlexCNN', 'AlexCNN_dropout']
running_times = [end_time_CNN_1, total_time_CNN_2, total_time_CNN_3, total_time_d]
running_times_rounded = [round(time, 3) for time in running_times]

# Create a DataFrame
df_running_times = pd.DataFrame({"Model": models, "Running Time (seconds)": running_times_rounded})

# Display the table using Matplotlib
fig, ax = plt.subplots(figsize=(8, 4))
ax.axis('off')

table = ax.table(cellText=df_running_times.values,
                 colLabels=df_running_times.columns,
                 loc='center',
                 cellLoc='center',
                 colColours=["#ADD8E6"] * len(df_running_times.columns),
                 cellColours=[["w"] * len(df_running_times.columns) for _ in range(len(df_running_times))])

table.auto_set_font_size(False)
table.set_fontsize(10)
table.scale(1.5, 2.5)

plt.show()

5.3.2 - Visualizing the incorrectly labelled images¶

To further interpret the test results and identify which images were incorrectly labelled, we displayed the images to see if that could help us in identifying a pattern. get_accuracy was used to display the images that were incorrectly labelled in the test dataset with the predicted label and the target label.

In [ ]:
def get_accuracy(model, data_loader, classes):
    correct = 0
    total = 0
    batch_size = data_loader.batch_size

    test_loader_orig = torch.utils.data.DataLoader(test_data, batch_size=batch_size,
                                           num_workers=num_workers, shuffle=False)

    dataiter = iter(test_loader_orig)
    images, labels = next(dataiter)
    images = images.numpy()  # convert images to numpy for display

    fig = plt.figure(figsize=(25, 4))

    for i, (imgs, labels) in enumerate(data_loader):
        if use_cuda and torch.cuda.is_available():
            imgs = imgs.cuda()
            labels = labels.cuda()

        output = model(imgs)
        pred = output.max(1, keepdim=True)[1]
        is_correct = pred.eq(labels.view_as(pred))
        correct += is_correct.sum().item()
        total += imgs.shape[0]

        incorrect_indices = (~is_correct).nonzero(as_tuple=True)[0]
        print(incorrect_indices.shape)

        for idx in range(incorrect_indices.shape[0]):
            predicted_label = classes[pred[incorrect_indices[idx]][0]]
            actual_label = classes[labels[incorrect_indices[idx]]]
            print(f"Image Index: {incorrect_indices[idx]}, Predicted Label: {predicted_label}, Actual Label: {actual_label}")
            #ax = fig.add_subplot(327, 1, idx + 1, xticks=[], yticks=[])
            plt.imshow(np.transpose(images[incorrect_indices[idx]], (1, 2, 0)))
            #ax.set_title(actual_label)
            plt.show()

    return correct / total

class_labels = ['Baroque', 'Cubism', 'Minimalism', 'Popart', 'Ukiyo']
test_loader = torch.utils.data.DataLoader(test_data_features, batch_size=327, num_workers=num_workers, shuffle=False)
get_accuracy(model_an, test_loader, classes)
torch.Size([55])
Image Index: 0, Predicted Label: Ukiyo, Actual Label: Baroque
Image Index: 12, Predicted Label: Cubism, Actual Label: Baroque
Image Index: 20, Predicted Label: Ukiyo, Actual Label: Baroque
Image Index: 33, Predicted Label: Ukiyo, Actual Label: Baroque
Image Index: 57, Predicted Label: Popart, Actual Label: Baroque
Image Index: 65, Predicted Label: Popart, Actual Label: Baroque
Image Index: 93, Predicted Label: Popart, Actual Label: Cubism
Image Index: 95, Predicted Label: Popart, Actual Label: Cubism
Image Index: 97, Predicted Label: Popart, Actual Label: Cubism
Image Index: 101, Predicted Label: Minimalism, Actual Label: Cubism
Image Index: 112, Predicted Label: Ukiyo, Actual Label: Cubism
Image Index: 127, Predicted Label: Popart, Actual Label: Cubism
Image Index: 128, Predicted Label: Popart, Actual Label: Cubism
Image Index: 136, Predicted Label: Baroque, Actual Label: Cubism
Image Index: 138, Predicted Label: Popart, Actual Label: Minimalism
Image Index: 140, Predicted Label: Baroque, Actual Label: Minimalism
Image Index: 142, Predicted Label: Popart, Actual Label: Minimalism
Image Index: 159, Predicted Label: Cubism, Actual Label: Minimalism
Image Index: 175, Predicted Label: Popart, Actual Label: Minimalism
Image Index: 184, Predicted Label: Ukiyo, Actual Label: Minimalism
Image Index: 202, Predicted Label: Popart, Actual Label: Minimalism
Image Index: 210, Predicted Label: Cubism, Actual Label: Popart
Image Index: 212, Predicted Label: Baroque, Actual Label: Popart
Image Index: 213, Predicted Label: Cubism, Actual Label: Popart
Image Index: 215, Predicted Label: Ukiyo, Actual Label: Popart
Image Index: 216, Predicted Label: Cubism, Actual Label: Popart
Image Index: 217, Predicted Label: Cubism, Actual Label: Popart
Image Index: 220, Predicted Label: Minimalism, Actual Label: Popart
Image Index: 221, Predicted Label: Ukiyo, Actual Label: Popart
Image Index: 223, Predicted Label: Minimalism, Actual Label: Popart
Image Index: 230, Predicted Label: Ukiyo, Actual Label: Popart
Image Index: 236, Predicted Label: Ukiyo, Actual Label: Popart
Image Index: 238, Predicted Label: Cubism, Actual Label: Popart
Image Index: 241, Predicted Label: Cubism, Actual Label: Popart
Image Index: 243, Predicted Label: Ukiyo, Actual Label: Popart
Image Index: 244, Predicted Label: Minimalism, Actual Label: Popart
Image Index: 248, Predicted Label: Ukiyo, Actual Label: Popart
Image Index: 249, Predicted Label: Cubism, Actual Label: Popart
Image Index: 250, Predicted Label: Cubism, Actual Label: Popart
Image Index: 251, Predicted Label: Cubism, Actual Label: Popart
Image Index: 255, Predicted Label: Ukiyo, Actual Label: Popart
Image Index: 256, Predicted Label: Cubism, Actual Label: Popart
Image Index: 258, Predicted Label: Cubism, Actual Label: Popart
Image Index: 259, Predicted Label: Cubism, Actual Label: Popart
Image Index: 260, Predicted Label: Ukiyo, Actual Label: Popart
Image Index: 261, Predicted Label: Minimalism, Actual Label: Popart
Image Index: 262, Predicted Label: Cubism, Actual Label: Popart
Image Index: 265, Predicted Label: Ukiyo, Actual Label: Popart
Image Index: 268, Predicted Label: Cubism, Actual Label: Ukiyo
Image Index: 277, Predicted Label: Minimalism, Actual Label: Ukiyo
Image Index: 295, Predicted Label: Popart, Actual Label: Ukiyo
Image Index: 305, Predicted Label: Popart, Actual Label: Ukiyo
Image Index: 312, Predicted Label: Cubism, Actual Label: Ukiyo
Image Index: 322, Predicted Label: Popart, Actual Label: Ukiyo
Image Index: 324, Predicted Label: Minimalism, Actual Label: Ukiyo
torch.Size([0])
Out[ ]:
0.8367952522255193

The main pattern that we noticed from viewing all the misclassified images was that the incorrect predictions were mainly for 'Pop-art' and 'Cubism' classes. One of the reasons for this could be due to visual similarities in the two art styles. Both images use vibrant and primary colours, prominent straight and curvy lines, unconventional shapes, and fragmented compositions.

For Cubism in particular, this shows an area where the model could be confused. This confusion could stem from the two forms of Cubism that exist within the Cubism movement: Analytical Cubism and Synthetic Cubism. Analytical Cubism is defined by muted colors and complex planes, while Synthetic Cubism is defined by bright colors, simpler shapes, and even collaged elements [1]. The misclassification may be due to the styles having overlapping features. The misclassified Synthetic Cubist art had brighter colors and collage elements, which could be mistaken for the characteristics of pop art. On the other hand, if the artwork had a muted color palette and complex details, it could be misclassified as Baroque.

This suggests that while our model can identify broader art styles, we will need additional training data with more distinct labeling for the model to be able to identify sub-styles effectively.

To further analyze our results, we calculated a confusion matrix to see the true performance for the different classes. get_metrics_per_class function was used to calculate the F1, Precision, and Recall scores as shown in the equations below and plot the corresponding confusion matrix. TP=True Positive, TN=True Negative, FP=False Positive, FN=False Negative

$$Precision = \frac{TP}{TP+FP} \ $$$$Accuracy = \frac{TP+FN}{TP+FP+FN+TN}$$$$F1\ Score = 2\times\frac{Precision \times Recall}{Precision + Recall}$$
In [ ]:
def get_metrics_per_class(model, data_loader, class_labels):
    model.eval()
    y_true = []
    y_pred = []

    with torch.no_grad():
        for imgs, labels in data_loader:
            if use_cuda and torch.cuda.is_available():
                imgs = imgs.cuda()
                labels = labels.cuda()

            output = model(imgs)
            _, predicted = torch.max(output, 1)

            y_true.extend(labels.cpu().numpy())
            y_pred.extend(predicted.cpu().numpy())

    f1_scores = f1_score(y_true, y_pred, average=None)
    precision, recall, _, _ = precision_recall_fscore_support(y_true, y_pred, average=None)

    table_data = []
    headers = ["Class", "F1 Score", "Precision", "Recall"]


    for class_idx, class_label in enumerate(class_labels):
        table_data.append([class_label, round(f1_scores[class_idx], 3), round(precision[class_idx], 3), round(recall[class_idx], 3)])

    fig, ax = plt.subplots(figsize=(8, 4))
    ax.axis('off')

    table = ax.table(cellText=table_data, colLabels=headers, loc="center", cellLoc="center",
                     colColours=["#ADD8E6"] * len(headers))

    table.auto_set_font_size(False)
    table.set_fontsize(10)
    table.scale(1.5, 2)

    plt.show()

    cm = confusion_matrix(y_true, y_pred)
    plt.figure(figsize=(8, 6))
    sns.heatmap(cm, annot=True, fmt="d", cmap="Blues", xticklabels=class_labels, yticklabels=class_labels)
    plt.xlabel("Predicted")
    plt.ylabel("True")
    plt.title("Confusion Matrix")
    plt.show()

class_labels = ['Baroque', 'Cubism', 'Minimalism', 'Popart', 'Ukiyo']
num_classes = len(class_labels)
get_metrics_per_class(model_an, test_loader, class_labels)

Findings from the Confusion Matrix:

We can see that the confusion matrix reflects our findings from visualizing the incorrectly labelled data. Baroque, Minimalism, and Ukiyo have the most instances of the correctly labelled images which are shown in the diagonal entries of the confusion matrix (True Positive). Some Baroque images were confused with Popart and Ukiyo as shown in the off-diagonal entries of the confusion matrix (False Positive).

Our findings from visualizing the incorrectly labelled images that Cubism and Popart were the most misclassified classes were true. The confusion matrix shows that Cubism has the highest False Positives with Popart but Popart has the highest False Positives with Minimalism.

Findings from the Recall, Precision Scores:

Another finding is that Cubism has a low precision score but a high recall score. The low precision score tells us that the model is lenient in making positive predictions and as a result predicts a lot of positives including many false positives. This leads to having a higher recall score because the model is able to capture a significant portion of the true positive instances.

6 - Implementation on New Data¶

To demonstrate that the model is generalizable and can correctly identify unseen images, new paintings needed to be sourced from a third-party. The Art Gallery of Ontario (AGO) has an online collection with art from a variety of artists and styles. Five new paintings were collected to represent each of the classes in the model. The image URLs were taken directly from the AGO's website.

The AGO also provided the labels for the data. Searching "Baroque" or "Cubism" in there online database provided many examples that aligned with the classes we wished to identify.

In [ ]:
# Define the list of image URLs that will be used to prove the effectiveness of the model.
urls = [
    'https://dbi5a5cdy48wt.cloudfront.net/loris/co10/ago.6277.jp2/full/680,/0/default.jpg', # Baroque
    'https://dbi5a5cdy48wt.cloudfront.net/loris/co7/ago.48945.jp2/full/680,/0/default.jpg', # Pop_art
    'http://imagelicensing.ago.ca/internal/media/dispatcher/144133/preview', # Ukiyo
    'https://dbi5a5cdy48wt.cloudfront.net/loris/co5/6510.jp2/full/680,/0/default.jpg', # Cubism
    'https://dbi5a5cdy48wt.cloudfront.net/loris/co13/23127.jp2/full/680,/0/default.jpg' # Minimalism
]

labels = [
    0, # Baroque
    3, # Pop_art
    4, # Ukiyo
    1, # Cubism
    2 # Minimalism
]

6.1 - Helper Functions¶

A series of helper functions are defined to process the images and obtain the features and predictions. The helper functions are categorized as: functions that help train model and output the classifications, and functions that process the images from the URLs.

In [ ]:
def get_prediction(model, data):
    '''
    A function that obtains the predicited labels on the new data.

    :param model: The trained AlexNet model
    :param data: The data loader that contains the AlexNet features for the new paintings.

    :return pred_class: a list of the predicted classes (in terms of their names).
    '''

    correct = 0
    total = 0

    for imgs, labels in data:
        str_labels = idx_to_class(labels)
        print(str_labels)
        imgs = imgs.squeeze()
        #############################################
        #To Enable GPU Usage
        if use_cuda and torch.cuda.is_available():
            imgs = imgs.cuda()
            labels = labels.cuda()
        #############################################

        output = model(imgs)

        #select index with maximum prediction score
        pred = output.max(1, keepdim=True)[1]
        # print(pred)

        correct += pred.eq(labels.view_as(pred)).sum().item()

        total += imgs.shape[0]

        pred_class = idx_to_class(pred)

    return pred_class

def get_torch_vars(xs, ys, gpu=False):
    '''
    A function that converts inputs and labels to their torch equivalent.

    :param xs: the input data
    :param ys: the label data

    :returns xs, ys: the torch versions of the input and label

    '''

    xs = torch.from_numpy(xs).float()
    ys = torch.from_numpy(ys).float()
    if gpu:
        xs = xs.cuda()
        ys = ys.cuda()
    return xs, ys

def get_features(image):
    '''
    Obtains the AlexNet features

    :param image: The new image data (torch)
    :returns features_tensor: The AlexNet features for the specified image.
    '''

    features = alexnet.features(image)
    features_tensor = torch.from_numpy(features.detach().numpy())

    return features_tensor

def idx_to_class(pred):
    '''
    A function that converts the class indices to strings

    :param pred: the class indices returned by the model.
    :returns pred_class: a list of the string labels associated with the predictions.
    '''

    pred_class = []
    for p in pred:
        if p == 0:
            pred_class.append("Baroque")
        elif p == 1:
            pred_class.append("Cubism")
        elif p == 2:
            pred_class.append("Minimalism")
        elif p == 3:
            pred_class.append("Pop_art")
        elif p == 4:
            pred_class.append("Ukiyo")

    return pred_class
In [ ]:
from PIL import Image
import requests

def get_img(url_list):
    '''
    :param url_list: The image URLs from the AGO

    :returns a numpy array of the retrieved images
    '''
    images = []

    for url in url_list:
        image = Image.open(requests.get(url, stream=True).raw)
        image = image.resize((224, 224))
        image = np.transpose(np.array(image), (2, 0, 1))

        image = new_data_process(image)

        images.append(image)

    return np.array(images)

def new_data_process(xs, max_pixel=256.0):
    '''
    Normalizes the images
    '''
    xs = xs / max_pixel
    return xs

def visual(img):
    '''
    Prints out the list of images
    '''
    img = np.transpose(img[:5, :, :, :], [0, 2, 3, 1])

    for i in range(5):
        ax = plt.subplot(3, 5, i + 1)
        ax.imshow(img[i])
        ax.axis("off")
    plt.show()

6.2 - Results Visualization¶

The image URLs are process and the AlexNet features are extracted. As seen below, the model was able to successfully identify the art style for all five images provided, which indicates that the network is successful in classifying unseen data.

In [ ]:
images = get_img(urls)

visual(images)
labels_np = np.array(labels)

img_torch, label_torch = get_torch_vars(images, labels_np)
dataset = []

for i in range(len(images)):
  alex_img = get_features(img_torch[i])
  dataset.append((alex_img, label_torch[i]))

loader = torch.utils.data.DataLoader(dataset, batch_size=len(images),
                                     num_workers=num_workers, shuffle=False)

get_prediction(model_an, loader)
['Baroque', 'Pop_art', 'Ukiyo', 'Cubism', 'Minimalism']
Out[ ]:
['Baroque', 'Pop_art', 'Ukiyo', 'Cubism', 'Minimalism']

7 - Related Works¶

We found several projects in the realm of utilizing convolutional neural networks for the classification of artworks. One paper titled “Using Convolutional Neural Networks to Classify Art Genre” used CNN without transfer learning to classify art styles. Similar to our project, their model was also the most successful in identifying Baroque artworks among all the styles. Their model achieved 94% accuracy in identifying Baroque artworks with an overall test accuracy of 81% [2].

Another project we found was “Artist Identification with Convolutional Neural Networks”, where the authors tested a variety of models from a CNN to a ResNet-18 network with transfer learning. Their best result came from a network based on ResNet-18 pre-trained on ImageNet with transfer learning, yielding a test accuracy of 89.8% [3]. The projects all highlighted the potential of CNNs and transfer learning in the field of art classification and the challenges they face, particularly with limited data and unbalanced data across styles.

8 - Next Steps¶

To further improve the results of this project, there are some recommendations and next steps.

First, training on a larger dataset is key. Perhaps in the future, scaling up to 10-15 art styles with at least 300 images per style would improve the applicability of this project.

The AlexNet model could be improved by using more layers and convolutions, instead of the two-convolution-two-fully-connected architecture we used earlier. However, there are limitations on the number of layers since AlexNet relies on 224x224 pixel images.

The reliance on AlexNet can be circumvented by developing our own CNN model further. One challenge with this is that training from scratch requires much computational power and time. However, such a model is no longer constrained by the size of AlexNet.

9 - References¶

[1] Tate, “Cubism,” Tate. Accessed: Dec. 08, 2023. [Online]. Available: https://www.tate.org.uk/art/art-terms/c/cubism

[2] J. DuBois, “Using Convolutional Neural Networks to Classify Art Genre”.

[3] N. Viswanathan, “Artist Identification with Convolutional Neural Networks”.

In [23]:
%%shell
jupyter nbconvert --to html /content/MIE1517_team7_final_report.ipynb
[NbConvertApp] Converting notebook /content/MIE1517_team7_final_report.ipynb to html
[NbConvertApp] Writing 20706022 bytes to /content/MIE1517_team7_final_report.html
Out[23]: